Develop stream 2024-09-12 #462

NB4444 · 2024-09-12T14:17:13Z

Merged changes from upstream CCCL/thrust 2.4.0
Synchronized rocThrust with Thrust v2.4.0.
Split the contents of HIPSTDPAR's forwarding header into several implementation headers.
Fixed --benchmark_format not working in benchmarks.
Added a caching allocator to benchmarks to reduce noise and runtime.

The pipeline fails because the version of rocPRIM it uses does not include this commit yet.

Closes #478.

NB4444 · 2024-09-12T14:27:00Z

@AlexVlx any opinions about the HIPSTDPAR changes?

AlexVlx

Overall LGTM, thanks for doing this! There are a few nits / things I'd like to see changed though, so please consider them. Cheers!

AlexVlx · 2024-09-12T15:54:21Z

thrust/system/hip/hipstdpar/hipstdpar_lib.hpp

+
+            hipError_t status;
+
+            status = rocprim::nth_element(


I'd rather we didn't do things like so, if possible, but rather simply kept the ::std:: overloads as a very thin forwarder to thrust i.e. this should just forward to thrust::nth_element(...), like the other algorithms did.

Beanavil · 2024-09-13

The reason behind forwarding to rocPRIM's algorithms is that currently rocThrust does not have support for the new Parallel STL algorithms that have been added to rocPRIM, as neither NVIDIA's Thrust nor CUB have implemented those yet.

AlexVlx · 2024-09-15

It'd be somewhat less cognitively challenging to have something like a stub thrust::__nth_element(...) call (we'd have to add the stubs, but you're already splitting into impl headers, no?) with a // TODO: this'll go away once something happens rather than having inline prim for a few algorithms, whilst most others forward to the higher level thrust interface. You'll note that the originals tried to keep away from directly doing any hip* stuff, and trying to do only C++ level handling before forwarding to Thrust. I am concerned with some problematic future, where some algorithms just doing a bunch of rocprim stuff inline because why not, some others keeping the forward to Thrust approach and, possibly, a 3rd category doing something else:) Perhaps I'm the odd person out with this preference / concern, though.

Beanavil · 2024-09-19

Sorry for the late answer. After discussing it a little bit internally, we are unsure whether adding stubs to rocThrust would be ideal: the interface of Thrust is based on that of the C++ Standard Library, but is not guaranteed to be exactly the same, so adding new API entries to rocThrust without knowing how Thrust would do it means that we may need to change this interface later (when Thrust adds these algorithms), which would be a breaking change.
If the only reason for adding stubs to rocThrust is removing the calls to rocPRIM from hipstdpar, then it's probably not worth the risk. Perhaps you have other worries about this that we are failing to see?

Also, about this:

> I am concerned with some problematic future, where some algorithms just doing a bunch of rocprim stuff inline because why not, some others keeping the forward to Thrust approach and, possibly, a 3rd category doing something else:)

I don't think it should be a concern, as the missing algorithms are exclusively being added to rocPRIM.

AlexVlx · 2024-09-19

I guess I don't really follow what the risk is. We're not subcontractors for the Thrust development team, so having additional interfaces wouldn't be a problem, if we were to go in that direction. Furthermore, perhaps the point is lost, but what I am suggesting is to do something along the lines of thrust::__nth_element_placeholder_stub_do_not_use_danger_will_robinson, where the latter holds the current rocprim calls. The definition for this can and should live in the implementation header. It's primarily a matter of symmetry and code consistency, no other algorithm calls Prim inline.

Beanavil · 2024-09-19

Oh I see, so there would still be calls to rocPRIM from the hipstdpar implementation headers, only that the algorithms themselves would appear as if they call to rocThrust (but they call the stubs), right? If that's the case I see no problem adding this.

AlexVlx · 2024-09-19

Yup, that's it; then, when / if upstream adds these, we can just go and delete the double underscore prefix and the impl. This prevents folks (future maintainers) from having to ask themselves "should I just use PRIM here, or should I call Thrust", and at a glance makes it obvious we always call Thrust from the overloads. Thank you for taking the time to go through this, it's highly appreciated!

Beanavil · 2024-09-20

Added the changes for this! Please check if it fits your idea for the use of stubs.
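
The stub arrangement agreed in this thread can be sketched roughly as follows. This is an illustration only, not the code that was merged: the namespaces `prim_sketch`, `thrust_sketch` and `stdpar_sketch`, the name `nth_element_stub`, and the helper `median_via_forwarder` are all hypothetical stand-ins, and the low-level call is replaced by a host `std::nth_element` so the example compiles without rocPRIM or rocThrust.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the low-level library (rocPRIM in the thread).
// The real rocprim::nth_element signature is not shown in full in the PR
// excerpt, so this host version only mimics "do the work, return a status".
namespace prim_sketch {
template <class It>
int nth_element(It first, It nth, It last) {
    std::nth_element(first, nth, last); // host fallback, for illustration only
    return 0;                           // 0 plays the role of hipSuccess
}
} // namespace prim_sketch

// Hypothetical stand-in for rocThrust. The thread proposes a double-underscore
// name such as thrust::__nth_element; a plain name is used here because such
// identifiers are reserved for the implementation.
namespace thrust_sketch {
template <class It>
void nth_element_stub(It first, It nth, It last) {
    // TODO: remove once upstream Thrust provides this algorithm.
    const int status = prim_sketch::nth_element(first, nth, last);
    assert(status == 0);
    (void)status;
}
} // namespace thrust_sketch

// Hypothetical stand-in for the HIPSTDPAR overload: a thin forwarder that
// contains no low-level calls of its own.
namespace stdpar_sketch {
template <class It>
void nth_element(It first, It nth, It last) {
    thrust_sketch::nth_element_stub(first, nth, last);
}
} // namespace stdpar_sketch

// Usage helper: returns the median of a copy of a non-empty input.
inline int median_via_forwarder(std::vector<int> v) {
    auto mid = v.begin() + v.size() / 2;
    stdpar_sketch::nth_element(v.begin(), mid, v.end());
    return *mid;
}
```

With this shape, every overload visibly calls into the Thrust-level namespace, and only the stub body changes if upstream later provides the algorithm.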

AlexVlx · 2024-09-12T15:55:06Z

thrust/system/hip/hipstdpar/hipstdpar_lib.hpp

+
+            hipError_t status;
+
+            status = rocprim::partial_sort(


AlexVlx · 2024-09-12T15:55:46Z

thrust/system/hip/hipstdpar/hipstdpar_lib.hpp

+
+            hipError_t status;
+
+            status = rocprim::partial_sort_copy(nullptr,


AlexVlx · 2024-09-12T16:00:54Z

thrust/system/hip/hipstdpar/impl/batch.hpp

+ *  \brief <tt>Batch operations</tt> implementation detail header for HIPSTDPAR.
+ */
+
+#pragma once


Doing the split is fine, since it aligns with Thrust convention, I guess; however, IMHO it'd be better if the master header (this), which is the object of implicit inclusion / is part of the public interface, would do nothing but include the split headers, as opposed to having any actual functionality. We could also probably guard the interpose_allocations.hpp one with its respective macros. Finally, we probably want to error on the impl headers if they're directly included without __HIPSTDPAR__ being defined, as that would indicate user error.

Beanavil · 2024-09-13

I have some questions about these suggestions:

> IMHO it'd be better if the master header (this), which is the object of implicit inclusion / is part of the public interface, would do nothing but include the split headers, as opposed to having any actual functionality

Do you mean that we should further split the master headers (the ones named after the C++ Standard Library categories for algorithms)?

> We could also probably guard the interpose_allocations.hpp one with its respective macros.

What macros do you refer to? IINM the interpose_allocations.hpp is already guarded by the __HIPSTDPAR__ and __HIPSTDPAR_INTERPOSE_ALLOC__ macros (as it was before splitting hipstdpar_lib.hpp).

> Finally, we probably want to error on the impl headers if they're directly included without __HIPSTDPAR__ being defined, as that would indicate user error.

And for this, I don't fully understand why would we need to error here. Originally there was no error handling in case that macro was not defined in the hipstdpar_lib.hpp, so I'm not sure why it would be done for the impl headers. Also, as I had understood it, the forwarding header was not intended to be included directly by users, but implicitly included by the compiler when specifying the --hipstdpar option (which also triggers the definition of the __HIPSTDPAR__ macro). So if this is true (please correct me if not), then it wouldn't make sense to check for the definition of this macro IMHO.

AlexVlx · 2024-09-15

> Do you mean that we should further split the master headers (the ones named after the C++ Standard Library categories for algorithms)?

Not quite, what I meant is that the master header would pretty much be a list of #includes for the impl headers, with no declarations / definitions or, really, any substantive code living there. In essence, the most we'd do here is check for environmental macros / possibly declare some future config macros, if necessary. Does that make sense?

> What macros do you refer to? IINM the interpose_allocations.hpp is already guarded by the __HIPSTDPAR__ and __HIPSTDPAR_INTERPOSE_ALLOC__ macros (as it was before splitting hipstdpar_lib.hpp).

What I was suggesting as a possible alternative is to check for __HIPSTDPAR_INTERPOSE_ALLOC__ in the master hipstdpar_lib.hpp header and do conditional include of the impl header in there, rather than guarding within it. This ties both to the point above re: what to do with the master header and to the one below about considering error-ing out.

> And for this, I don't fully understand why would we need to error here. Originally there was no error handling in case that macro was not defined in the hipstdpar_lib.hpp, so I'm not sure why it would be done for the impl headers. Also, as I had understood it, the forwarding header was not intended to be included directly by users, but implicitly included by the compiler when specifying the --hipstdpar option (which also triggers the definition of the __HIPSTDPAR__ macro). So if this is true (please correct me if not), then it wouldn't make sense to check for the definition of this macro IMHO.

Whilst the vote of confidence is highly appreciated, I'd not take the original header to necessarily be complete / the best way to do things:) To wit, whilst the forwarding header was / is not meant to be included by users, that does not mean users won't do it (accidentally / intentionally). Error-ing out if the macro is not defined i.e. compilation for hipstdpar was not desired would guard against accidents, and will at least force intentional abusers to manually define the macro, which'd help with eventual debugging. I suggested it because I think it might be a time saver / inform users about things potentially going awry. Ultimately, it's up to you if you want to go in this direction.

Beanavil · 2024-09-16

> Not quite, what I meant is that the master header would pretty much be a list of #includes for the impl headers, with no declarations / definitions or, really, any substantive code living there. In essence, the most we'd do here is check for environmental macros / possibly declare some future config macros, if necessary. Does that make sense?

Oh alright, I was confused because the comment was made on impl/batch.hpp. But then, the master header hipstdpar_lib.hpp is already just a macro definition check (__HIPSTDPAR__) and a bunch of includes (for the impl/ headers)

> What I was suggesting as a possible alternative is to check for __HIPSTDPAR_INTERPOSE_ALLOC__ in the master hipstdpar_lib.hpp header and do conditional include of the impl header in there, rather than guarding within it. This ties both to the point above re: what to do with the master header and to the one below about considering error-ing out.

> Whilst the vote of confidence is highly appreciated, I'd not take the original header to necessarily be complete / the best way to do things:) To wit, whilst the forwarding header was / is not meant to be included by users, that does not mean users won't do it (accidentally / intentionally). Error-ing out if the macro is not defined i.e. compilation for hipstdpar was not desired would guard against accidents, and will at least force intentional abusers to manually define the macro, which'd help with eventual debugging. I suggested it because I think it might be a time saver / inform users about things potentially going awry. Ultimately, it's up to you if you want to go in this direction.

Regarding those two points, the final logic should then be:

- Check in hipstdpar_lib.hpp if __HIPSTDPAR__ is defined and include the impl headers if so, except interpose_allocations.hpp, which will only be included if __HIPSTDPAR_INTERPOSE_ALLOC__ is also defined.
- Inside each implementation header, check if __HIPSTDPAR__ is defined and err if it's not.
- Additionally, in interpose_allocations.hpp, check if __HIPSTDPAR_INTERPOSE_ALLOC__ is defined and err if not.

If that sounds good I'll add a commit with the changes.

AlexVlx · 2024-09-19

This sounds great, thanks and apologies for adding work.

Beanavil · 2024-09-20

Also added the err logic discussed before
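
The header logic settled on in this thread can be sketched in a single file as below. It is a simplified illustration, not the merged headers: `SKETCH_HIPSTDPAR` and `SKETCH_HIPSTDPAR_INTERPOSE_ALLOC` are hypothetical stand-ins for `__HIPSTDPAR__` and `__HIPSTDPAR_INTERPOSE_ALLOC__` (which would normally be defined by the compiler), the `#include` lines are replaced by marked sections, the error messages are invented, and the `sketch` namespace exists only so the result can be inspected.

```cpp
#include <cassert>

// Hypothetical stand-ins for the compiler-provided macros.
#define SKETCH_HIPSTDPAR 1
// SKETCH_HIPSTDPAR_INTERPOSE_ALLOC is deliberately left undefined here.

// ===== master header (hipstdpar_lib.hpp): macro checks and includes only =====
#if defined(SKETCH_HIPSTDPAR)

// ----- stands in for: #include "impl/batch.hpp" -----
#  if !defined(SKETCH_HIPSTDPAR)
#    error "impl header included directly, but HIPSTDPAR compilation is not enabled"
#  endif
namespace sketch { constexpr bool batch_included = true; }
// ----- end of impl/batch.hpp -----

#  if defined(SKETCH_HIPSTDPAR_INTERPOSE_ALLOC)
// ----- stands in for: #include "impl/interpose_allocations.hpp" -----
#    if !defined(SKETCH_HIPSTDPAR)
#      error "impl header included directly, but HIPSTDPAR compilation is not enabled"
#    endif
#    if !defined(SKETCH_HIPSTDPAR_INTERPOSE_ALLOC)
#      error "interpose_allocations.hpp included, but allocation interposition is not enabled"
#    endif
namespace sketch { constexpr bool interpose_included = true; }
// ----- end of impl/interpose_allocations.hpp -----
#  else
// Only needed so this single-file sketch can report what was skipped.
namespace sketch { constexpr bool interpose_included = false; }
#  endif

#endif // SKETCH_HIPSTDPAR

// Checks the outcome for the macro combination chosen at the top of the file.
inline void check_sketch() {
    assert(sketch::batch_included);
    assert(!sketch::interpose_included);
}
```

In the real layout the checks inside each section would sit at the top of the corresponding impl header, so that including one of them directly without the macro fails at compile time.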

README.md

Beanavil · 2024-09-13T14:36:33Z

@AlexVlx Hi! I answered your comments about the hipstdpar modifications, as I was the one responsible for those. Thanks for reviewing this :)

Beanavil · 2024-10-22T09:18:26Z

@AlexVlx just to check, are we good with the latest changes made to the hipstdpar headers? I'd like to know whether more changes are needed or we can move on with this.

AlexVlx

LGTM for the hipstdpar piece, many thanks for doing this!

stanleytsang-amd · 2024-10-31T20:31:54Z

Apologies for the delay on getting the PR approved. Currently it looks like with the newer mainline images/compiler, there is a build error in rocThrust, but it's a compiler error, not a rocThrust error. I will submit a bug report to the compiler team. I am able to build this with a 6.3 compiler and the tests pass in my limited testing, so I will merge this into develop for now.

stanleytsang-amd · 2024-10-31T22:24:21Z

Actually, wait. I'm trying with a different compiler, this time from a 6.2.1 RC2 build, and I'm getting build errors that seem legitimate:
[ 17%] Building CXX object _deps/thread-building-blocks-build/test/CMakeFiles/conformance_arena_constraints.dir/conformance/conformance_arena_constraints.cpp.o
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:281:52: error: alias must point to a defined variable or function
281 | void *aligned_alloc(size_t alignment, size_t size) __TBB_ALIAS_ATTR_COPY(memalign);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:281:52: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:288:46: error: alias must point to a defined variable or function
288 | void *__libc_calloc(size_t num, size_t size) __TBB_ALIAS_ATTR_COPY(calloc);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:288:46: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:289:54: error: alias must point to a defined variable or function
289 | void *__libc_memalign(size_t alignment, size_t size) __TBB_ALIAS_ATTR_COPY(memalign);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:289:54: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:290:35: error: alias must point to a defined variable or function
290 | void *__libc_pvalloc(size_t size) __TBB_ALIAS_ATTR_COPY(pvalloc);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:290:35: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:291:34: error: alias must point to a defined variable or function
291 | void *__libc_valloc(size_t size) __TBB_ALIAS_ATTR_COPY(valloc);
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))
| ^
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:291:34: note: the function or variable specified in an alias must refer to its mangled name
/src/rocThrust/build/_deps/thread-building-blocks-src/src/tbbmalloc_proxy/proxy.cpp:74:56: note: expanded from macro '__TBB_ALIAS_ATTR_COPY'
74 | #define __TBB_ALIAS_ATTR_COPY(name) __attribute__((alias (#name)))

@Beanavil @NB4444 What ROCm build are you using to compile rocThrust with?
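
For context on the diagnostic in the log above, the following is a minimal sketch of what the `alias` attribute requires. It does not reproduce or diagnose the oneTBB failure; `add_one`, `add_one_alias` and `call_through_alias` are hypothetical names, and the attribute is a GCC/Clang extension that is assumed here to be used on an ELF target such as Linux.

```cpp
#include <cassert>

// The alias target must be defined in this translation unit. extern "C" keeps
// the symbol name unmangled, so the string "add_one" matches the symbol.
extern "C" int add_one(int x) { return x + 1; }

// Second name for the same function. If the target were only declared, or if
// the string did not match the target's (mangled) symbol name, the compiler
// would reject this declaration.
extern "C" int add_one_alias(int x) __attribute__((alias("add_one")));

// Usage helper: calls the function through its alias.
inline int call_through_alias(int x) { return add_one_alias(x); }
```

The errors in the log are the compiler reporting that it could not find such a definition for the names passed to `__TBB_ALIAS_ATTR_COPY` in that build environment.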

NB4444 · 2024-11-01T13:45:32Z

I had a look at it. We use 6.2.1 in our CI, where it works, but I was able to reproduce this on a machine. It has to do with the TBB that is used in hipstdpar. I found somewhere that oneTBB needs libstdc++11 or newer to work. I was not able to find a fix. Will discuss with @Beanavil on Monday.

stanleytsang-amd · 2024-11-08T04:57:21Z

@Beanavil I'm not sure your latest commit to disable TBB tests changes anything; I'm still seeing the same error while trying to build the rocThrust unit tests. What does TBB_TEST control? I do not see it used anywhere. I will upload a Docker image where I still see the build errors.

Beanavil · 2024-11-08T12:00:49Z

Hi @stanleytsang-amd, the latest two commits do two things:

- Avoid compiling the hipstdpar tests when the C++ STL implementation used is (probably) incompatible with oneTBB.
- Prevent oneTBB's (internal) tests from being built (they are built by default). In my local setup I got build errors coming from these tests, which really don't need to be built in our use case.

Yesterday I was also able to access the pipeline output and the Docker images to reproduce the errors; I give more details about what I could debug in my answer to your email.

Beanavil and others added 30 commits September 12, 2024 09:30

Fixed overflow bug for large sizes in thrust::shuffle

9075421

Added definitions of execution space macros

e9397cc

Add missing overloads for thrust::pow

a7a5d20

Refactors thrust::unique_by_key to use cub::DeviceSelect::UniqueByKey

7006599

Fix a typo in thrust-config.cmake

e5dbdaa

Check that thrust::pair is trivially copyable

bd43018

Remove double ignore in discard_iterator.h docs

b7b785e

Replace deprecated _VSTD macro with std

93b72cd

Update mode example to use thrust::unique_count

f3e2676

Ensure that thrust fancy iterators are trivially_copy_constructible w…

44d7369

…hen possible

Use checked allocators in CUB catch2 tests

a32a67c

Refactors thrust::copy_if to use cub::DeviceSelect

b741017

Refactor thrust::[stable_]partition[_copy] to use cub::DevicePartition

158fa53

Fix include of <thrust/random.h> with NVC++

bc6c83b

Cleanup diagnostic handling

489c073

Rework config.h

9f5a3ba

Bump version to 2.4.0

1020a11

Fix issues with ambiguous calls to addressof in thrust::optional

917c255

Try harder to unwrap nested thrust::tuple_of_iterator_references, CUD…

5af1ef7

…A backend

Added missing element from thrust's tuple implementation

bd5228c

Ensure that we can run reduce_by_key with const inputs

099a901

Leave definitions of __host__ and __device__

9508470

This prevents CCCL/thrust's build breakage because of v2.4.0 changes

Patched up CI because of CCCL2.4.0 tests' build failure

6791366

Updated tests and examples for __host__ __device__ use

9fe0b04

Updated CHANGELOG

15a07b0

Added operator to transform_reduce benchmark

158a1e1

Added mem allocator in benchmarks

d0bf50f

Changes for review

aa64ae7

ci: set up sccache

0673125

Added helper functions for choosing between different custom reporter

75c44cf

Beanavil added 2 commits September 12, 2024 09:35

Split hipstdpar_lib.hpp

8aff938

Added relevant information to README and CHANGELOG regarding HIPSTDPAR

e2d548f

NB4444 marked this pull request as ready for review September 12, 2024 14:21

NB4444 requested review from a team, stanleytsang-amd, umfranzw, RobsonRLemos and lawruble13 as code owners September 12, 2024 14:21

AlexVlx requested changes Sep 12, 2024

AlexVlx reviewed Sep 12, 2024

AlexVlx requested a review from afanfa September 12, 2024 17:19

Clarified upstream LLVM offload support

28da1b1

Beanavil added 2 commits September 20, 2024 11:50

Emit error when HIPSTDPAR macros are not defined

ed35b28

Move forwarding calls to rocPRIM to thrust's stubs

5023555

AlexVlx approved these changes Oct 24, 2024

Naraenda mentioned this pull request Oct 28, 2024

[Issue]: Build with Clang >= 19 and libc++ fails on _VSTD macro #478

Open

stanleytsang-amd approved these changes Oct 31, 2024

Beanavil added 3 commits November 6, 2024 08:12

Fix path to hipstdpar impl headers

e340947

Prevent building hipstdpar tests when no compatible libstdc++ is present

1f060d8

Disable TBB tests build

5fdf870

NB4444 mentioned this pull request Nov 7, 2024

Develop stream 2024-11-07 #486

Open

Merge branch 'develop' into develop-upstream-12-9-2024

5be02c2
